Identification of Candidate Biomarkers for Idiopathic Thrombocytopenic Purpura by Bioinformatics Analysis of Microarray Data

Idiopathic Thrombocytopenic Purpura (ITP) is a multifactorial disease characterized by a decreased platelet count that can lead to bruising and bleeding manifestations. This study was intended to identify critical genes associated with chronic ITP. The gene expression profile GSE46922 was downloaded from the Gene Expression Omnibus database, and Differentially Expressed Genes (DEGs) were identified with R software. Gene ontology and pathway analyses were performed with DAVID. The biological network was constructed using Cytoscape. Molecular Complex Detection (MCODE) was applied for module detection. Transcription factors were identified with the PANTHER classification system database, and the gene regulatory network was constructed with Cytoscape. One hundred thirty-two DEGs were screened from the comparison of newly diagnosed ITP with chronic ITP. Biological process analysis revealed that the DEGs were enriched in terms related to positive regulation of autophagy and inhibition of apoptosis in the chronic phase. KEGG pathway analysis showed that the DEGs were enriched in the ErbB signaling pathway, mRNA surveillance pathway, Estrogen signaling pathway, and Notch signaling pathway. Additionally, the biological network was established, and five modules were extracted from it. ARRB1, VIM, SF1, BUB3, GRK5, and RHOG were detected as hub genes that also belonged to the modules. SF1 was also identified as a hub-TF gene. To sum up, microarray data analysis could yield a panel of genes that provides new clues for diagnosing chronic ITP.


Introduction
Immune thrombocytopenic purpura (ITP), also known as idiopathic thrombocytopenic purpura, is a multifactorial autoimmune bleeding disease associated with platelet destruction and characterized by isolated thrombocytopenia (platelet count < 150,000/µL); it has been reported in almost 2 per 100,000 adults, with a mean age at diagnosis of 50 years (1,2). Although its pathogenesis remains unclear, abnormalities in the number and function of different immune cells can play a crucial role in this disease. An ITP phenotype characterized by dysfunctional T-lymphocyte immunity and dysregulation of pre-B-cell and T-cell immunophenotypic markers has been recognized in bone marrow lymphocytes of pediatric ITP patients (3,4). Besides, it is believed that the platelet membrane glycoprotein IIb-IIIa is targeted by immunoglobulin G autoantibodies, which is supported by significantly elevated CRP levels in ITP patients (5,6). These autoantibodies are detected in 40-60% of patients and enable Kupffer cells in the liver and splenic macrophages to phagocytose platelets (7). Other mechanisms include impaired production of the platelet stimulatory hormone thrombopoietin, reduced expression of human leukocyte antigen-G and immunoglobulin-like transcripts, and secondary contributors such as childhood exposure to viruses, Helicobacter pylori infection, and pregnancy (8)(9)(10). Zhang et al. determined six marker proteins that separate primary ITP from secondary ITP: NPS, EDN1, CORT, CLEC7A, CCL18, and NPPB. Most of the detected proteins are related to the immune system and act as up- or down-regulators in macrophages and platelets (11). The expression of CD38 on platelets can be recognized as a prognostic marker for ITP (2).
ITP is classified into acute and chronic types and sub-categorized by primary and secondary etiology (9,10). In addition, the alternative classification in the international consensus guidelines defines three phases: newly diagnosed (up to 3 months), persistent (3-12 months' duration), and chronic (over 12 months' duration) (2,6,11). ITP patients are characterized by a decreased platelet count in peripheral blood and variable bleeding symptoms. In severe cases, the disease may lead to fatal intracranial hemorrhage. Thus, prompt diagnosis and early therapeutic intervention are essential (12,13).
There are no specific criteria for diagnosing ITP; diagnosis is based on the exclusion of other diseases, such as lupus erythematosus, von Willebrand disease type IIb, hemolytic uremic syndrome, Evans syndrome, disseminated intravascular coagulation, post-transfusion purpura, paroxysmal nocturnal hemoglobinuria, myelodysplastic syndrome, lymphoproliferative disorders, infections (viral, bacterial, parasitic), and drug-induced thrombocytopenia. Furthermore, antiplatelet antibody testing is not recommended because of high inter-laboratory variability and reduced sensitivity (14)(15)(16).
Microarray technology is a prevalent technique for studying the expression patterns of a large number of genes across a genome. Microarray data are important in many aspects of disease research, including primary research, target discovery, biomarker identification, and prognostic test determination. The methods used to analyze the data can have a profound effect on the interpretation of the results (17,18). Network analysis of high-throughput data can be useful in bridging the gap between data production and drug targeting and helps to uncover biological complexity (19,20). Therefore, to explore the molecular mechanism and discover specific biomarkers for pediatric chronic ITP compared with newly diagnosed ITP, we applied bioinformatics techniques to analyze gene expression profiles and identify DEGs. For this aim, gene expression profiles of pediatric chronic ITP patients, downloaded from the GEO database, were first compared with those of pediatric newly diagnosed patients. DEGs were identified using the limma package of the R software. The involvement of DEGs in biological processes (BP), cellular components (CC), molecular functions (MF), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways was assessed with the DAVID online tool. DEGs were visualized using Cytoscape software, and network analysis in Cytoscape was applied to predict probable biomarkers. The PANTHER database was used for transcriptional regulatory network construction. These analyses could help find crucial genes that might be applied in appropriate diagnostic and treatment strategies for ITP.

Microarray data
The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo) is a public repository that stores microarray and next-generation sequencing data and is freely available to users. In this study, ITP genomic data were obtained from GEO with the series accession number GSE46922 and the platform GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array). This dataset included data from thirteen blood samples: seven newly diagnosed and six chronic samples described by Margareta Jernas et al. (21).

Data processing
Data retrieved from GEO were analyzed with R software to discover significantly differentially expressed genes using statistical tests, mainly the t-statistic and P-value. R can compare two or more groups of samples and classify genes that are differentially regulated under the same experimental condition. P-values were estimated with the limma R package from the Bioconductor project, and the Benjamini false discovery rate was taken into account. Genes with P-value < 0.05 and |M| > 0.5 (that is, M < -0.5 or M > 0.5, where M is the log2 fold change) were chosen for further evaluation.
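The selection rule described above can be illustrated with a minimal Python sketch. It is not the limma workflow used in the study: the function names, the toy gene identifiers, and the toy values are assumptions made for illustration only, and the sign convention of M (which group is the reference) is not stated in the text. The first function shows a standard Benjamini-Hochberg adjustment; the second applies the P-value < 0.05 and |M| > 0.5 thresholds.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(rows, p_cutoff=0.05, m_cutoff=0.5):
    """Split genes into up- and down-regulated lists.

    rows: iterable of (gene, M, p_value), where M is the log2 fold change.
    A gene is kept only if p_value < p_cutoff and |M| > m_cutoff.
    """
    up, down = [], []
    for gene, m_value, p_value in rows:
        if p_value >= p_cutoff:
            continue
        if m_value > m_cutoff:
            up.append(gene)
        elif m_value < -m_cutoff:
            down.append(gene)
    return up, down


# Hypothetical toy values, not taken from GSE46922.
toy = [("GENE_A", 1.2, 0.001), ("GENE_B", -0.8, 0.02),
       ("GENE_C", 0.3, 0.01), ("GENE_D", -1.5, 0.20)]
up_genes, down_genes = select_degs(toy)
adj = benjamini_hochberg([p for _, _, p in toy])
```

In this toy input, GENE_C is dropped because its fold change is too small and GENE_D because its P-value is too large, which shows that both thresholds must hold at once.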

Functional and pathway enrichment analysis
Up-regulated and down-regulated genes were analyzed separately with the DAVID enrichment database (version 6.8) (https://david.ncifcrf.gov).
The Database for Annotation, Visualization, and Integrated Discovery (DAVID) is a web-accessible program that provides a comprehensive set of functional annotation tools to disclose the biological meaning behind gene sets. DAVID contains numerous public sources of protein and gene annotation from more than 65,000 species (22). Gene Ontology (GO; www.geneontology.org) and Kyoto Encyclopedia of Genes and Genomes (KEGG; www.genome.ad.jp/KEGG) enrichment analyses were performed with DAVID to characterize the DEG lists functionally. We used functional annotation clustering to reveal the clusters enriched in gene ontology and KEGG pathway terms, together with their enrichment scores. GO, which is widely used in bioinformatics, was used for categorization into biological process, molecular function, and cellular component, which increases the possibility of identifying the most relevant mechanisms. KEGG was used to identify the pathways most relevant to the informative genes.
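The idea behind term enrichment can be sketched with a plain hypergeometric over-representation test. This is a generic illustration and not DAVID's own scoring, which, as far as we are aware, uses a modified Fisher exact statistic and a separate cluster-level enrichment score; the function name and the toy counts below are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(list_hits, list_size, pop_hits, pop_size):
    """Upper-tail hypergeometric p-value, P(X >= list_hits).

    list_size genes are drawn from a background of pop_size genes,
    of which pop_hits are annotated with the term of interest.
    """
    max_hits = min(list_size, pop_hits)
    total = comb(pop_size, list_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical toy counts: 3 of 5 listed genes carry a term that
# annotates 3 of 10 background genes.
p_toy = hypergeom_enrichment_p(3, 5, 3, 10)
```

A small p-value means the term appears in the gene list more often than expected by chance given the background; in practice the p-values of many terms would also be corrected for multiple testing.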

Network construction and modules selection
A DEG interaction network can clarify the molecular mechanisms of cellular processes. A functional interaction network of the DEGs was constructed with Cytoscape (version 3.5.1) and extended using the public databases accessible through Cytoscape. Highly connected nodes were selected as hubs, and the nodes with the highest betweenness centrality were designated as bottleneck nodes. Then, Molecular Complex Detection (MCODE) was used to screen for modules. The functional enrichment analysis of the DEGs in each module was performed with DAVID.
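The two topological measures used here, degree for hubs and betweenness centrality for bottlenecks, can be sketched in plain Python. Cytoscape computes these internally; the code below is an independent illustration using Brandes' algorithm for unweighted, undirected graphs, and the node names, the toy edge list, and the toy cut-off of 3 are assumptions.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes, unweighted graph)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair is visited from both endpoints, so halve.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical toy network: one central node linked to four others.
toy_edges = [("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("HUB", "D")]
graph = build_graph(toy_edges)
degree = {v: len(nbrs) for v, nbrs in graph.items()}
hubs = sorted(v for v, d in degree.items() if d >= 3)  # toy cut-off
bottleneck_scores = betweenness(graph)
```

A node can rank high on one measure and not the other, which is why hubs and bottlenecks are reported separately and nodes that satisfy both are singled out.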

Transcriptional regulatory network construction
In order to identify the transcription factor (TF) nodes in the network, the PANTHER Classification System database (http://www.pantherdb.org/) was used (23). Then the transcriptional regulatory network was visualized with Cytoscape.

Data screening
Based on P-value < 0.05 in the comparison of newly diagnosed ITP with chronic ITP, 132 DEGs were identified, consisting of 78 up-regulated (Supplementary Table S1) and 54 down-regulated genes (Supplementary Table S2). As shown in Figure 1, the medians were located at the same level after data normalization with R software, indicating that the normalization was effective.

Gene ontology and pathway enrichment analysis
Gene Ontology and pathway functional enrichment analyses were performed on the identified DEGs, with P-value < 0.05 as the significance criterion. The enriched BP terms for up-regulated genes are reported in Table 1 and Supplementary Table S3. These genes were significantly involved in 10 clusters of biological processes, associated with regulation of autophagy, cell cycle checkpoint, regulation of gene expression, cellular component organization or biogenesis, positive regulation of cell projection organization, macromolecule metabolic process, sister chromatid segregation, the establishment of protein localization to the membrane, and regulation of protein tyrosine kinase activity. The up-regulated genes were located in 5 cellular component clusters associated with the intracellular part, organelle, nucleus, intracellular non-membrane-bounded organelle, chromosome, centromeric region, and ciliary membrane (Supplementary Table S4). The significant molecular functions, shown in Supplementary Table S5, comprised 2 clusters linked to nucleic acid binding and protein kinase activity.
Moreover, the biological process terms overrepresented in down-regulated DEGs with significant P-values were mainly related to the modification of morphology or physiology of other organism involved in symbiotic interaction, cellular response to monosaccharide stimulus, programmed cell death, histone methylation, negative regulation of sequence-specific DNA binding transcription factor activity, and positive regulation of binding (Table 2 and Supplementary Table S6).
The significant cellular components related to down-regulated genes comprised four clusters that were mainly associated with intracellular, organelle part, membrane-bounded organelle, lysosomal membrane, lytic vacuole membrane, and chromosomal part (Supplementary Table S7). Two significant molecular function clusters for down-regulated genes, shown in Supplementary Table S8, were related to lipase activity, hydrolase activity, acting on ester bonds, and structure-specific DNA binding.
The significant pathways for up-regulated and down-regulated DEGs are shown in Table 3. The pathway enrichment analysis indicated that the up-regulated DEGs are involved in the ErbB signaling pathway, mRNA surveillance pathway, and Estrogen signaling pathway, whereas only the Notch signaling pathway is related to the down-regulated genes.

Table 1. Gene ontology enrichment analysis based on biological process analysis of up-regulated DEGs. They were selected with a significant value P < 0.05. The Enrichment Score comes from the DAVID "functional annotation clustering" analysis of the gene lists.

Network construction and modules selection
Based on the public databases available in Cytoscape, the PPI network of DEGs was established. The network consisted of 1137 nodes and 2647 edges (Figure 2A). The cut-off criterion for hub gene selection was set at degree ≥ 40. Based on this cut-off, twenty hubs were recognized in the network, of which seventeen were DEGs and were selected as hub nodes. They consisted of eleven up-regulated (ATF2, VIM, PAK2, SF1, BUB3, PCF11, PCF12, FBXW7, GSPT1, CLIP1, ABL2) and six down-regulated (ARRB1, KPNA2, GRK5, TUFM, RHOG, TEX264) genes (Table 4).
The functional modules were assessed using the MCODE plugin. Five modules were identified, comprising thirty nodes and 36 edges. We found six DEGs present among both the hub genes and the modules, with significant P-values (P-value < 0.05) for enriched BP. These genes included three up-regulated genes and three down-regulated genes (Table 6).
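The overlap step described above, keeping only hubs that are DEGs and that also belong to a module, can be sketched as a simple set-based filter. The function name, the gene identifiers, and the module membership below are hypothetical placeholders, not the study's data; the enrichment P-value criterion is omitted for brevity.

```python
def select_key_genes(hubs, modules, deg_direction):
    """Return {gene: (module_name, direction)} for DEG hubs found in a module.

    hubs: set of hub node names (may include non-DEG nodes added when the
          network was extended from public databases).
    modules: dict mapping module name to a set of member nodes.
    deg_direction: dict mapping each DEG to "up" or "down".
    """
    key_genes = {}
    for name, members in modules.items():
        for gene in members:
            if gene in hubs and gene in deg_direction:
                key_genes[gene] = (name, deg_direction[gene])
    return key_genes


# Hypothetical toy inputs. X1 stands for a hub that is not a DEG.
toy_hubs = {"G1", "G2", "G3", "X1"}
toy_modules = {"module_1": {"G1", "G9"}, "module_2": {"G3", "X1"}}
toy_degs = {"G1": "up", "G2": "up", "G3": "down", "G9": "down"}
key_genes = select_key_genes(toy_hubs, toy_modules, toy_degs)
```

Here G2 is excluded because it is in no module, G9 because it is not a hub, and X1 because it is not a DEG, mirroring the way non-DEG hubs are set aside in the hub table.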

Transcriptional regulatory network construction
One hundred twenty-one nodes with TF function were identified among the 1137 network nodes using the PANTHER database. These 121 nodes were visualized as a regulatory network with Cytoscape.
Further analysis of potentially remarkable modules was performed by detecting TFs with a high degree of connections with other nodes, the so-called hub-TFs.
It should be noted that SF1 is the hub node in the transcriptional regulatory network. SF1, as a TF-encoding gene, is a hub-TF (Figure 3).

Table 2. Gene ontology enrichment analysis based on biological process analysis of down-regulated DEGs. They were selected with a significant value P < 0.05. The Enrichment Score comes from the DAVID "functional annotation clustering" analysis of the gene lists.

Table 3. KEGG pathway enrichment analysis of up-regulated and down-regulated DEGs. They were selected with a significant value P < 0.05 in the DAVID database.

Table 4. Hub genes with the cut-off criterion degree ≥ 40. Three genes did not exist among the DEGs and were added by Cytoscape software. Bottleneck genes are marked with a star in the betweenness centrality column.

Figure (panels Module 1 to Module 5). Node color and shape indicate hub (red node), seed (green node), and TF (yellow and red triangle) nodes. There is just one red rectangle, in module No. 2, which marks the node that is a hub-seed gene. Red triangles mark hub-TFs, and the green triangle marks a seed-TF. SF1 and ATF2 are hub-TFs and ZNF382 is a seed-TF. ZNF382 and SF1 are members of modules No. 1 and No. 2, respectively.

Table 5. The genes with the highest betweenness centrality were selected as bottlenecks. Five genes did not exist among the DEGs and were added by Cytoscape software. Stars in the degree column mark genes that are also hub genes.

Discussion
Diminished platelet production and enhanced platelet destruction are the familiar characteristics of ITP (24). However, the first hit for dysregulation of the immune system in ITP remains unknown (25). Understanding the molecular and pathophysiological mechanisms of ITP requires considerable effort in order to design new preventive and therapeutic strategies. Due to the interaction of genes and environmental factors in common human diseases, a more integrated biological approach is needed to resolve these complexities (26). DNA microarrays are a powerful technique in biomedical research. This method has attracted much attention from scientists because of its ability to measure thousands of genes, and even the entire genome, simultaneously (26). Systemic network analysis of high-throughput data is a highly useful technique for explaining important questions in life science. Network features, such as composition and topology, are highly relevant to vital cellular functions, so they are critical in biological science research (27). This study tries to find essential genes and mechanisms that differ between newly diagnosed and chronic ITP by bioinformatics analysis of GSE46922 microarray data.
Table 6. Key genes related to chronic ITP, selected based on multiple criteria of data analysis. Hub genes with the cut-off criterion degree ≥ 40 that also exist in modules were selected as potential biomarkers for chronic ITP. The fold change of the expressed genes in the microarray is given as the M index, that is, the log2 fold change.
This study identified 132 DEGs, consisting of 78 up-regulated and 54 down-regulated genes, which are differentially expressed between newly diagnosed ITP and chronic ITP. Our enrichment analysis of the up-regulated DEGs showed that autophagy played a significant role in ITP. There is evidence that the positive regulation of autophagy is the most relevant biological process in ITP associated with the genes expressed in the chronic phase. Autophagy contributes to the maintenance of platelet life and physiological functions (28). Improper expression of molecules in the autophagy pathway has also been determined in the lymphocytes of ITP patients (29). Elevated platelet autophagy has also been shown to diminish platelet destruction by inhibiting apoptosis and improving platelet viability (28). Besides, particular evidence implied that megakaryocytes undergo autophagy in ITP patients (30). In chronic ITP, the apoptotic process was diminished in accordance with the activated autophagy process.
Our study has shown that the down-regulated genes in the chronic phase were mainly enriched in Notch signaling, which is closely related to hematopoiesis: it is involved in the generation of hematopoietic stem cells in the developing hematopoietic system, in the development of immune cells such as T cells, and in the progression of several autoimmune diseases such as ITP (32). Rania Mohsen Gawdat et al. found a correlation of Notch1/Hes1 gene expression levels in Egyptian paediatric patients with newly diagnosed and persistent primary ITP (31,32). We detected this pathway in newly diagnosed ITP, whereas it was down-regulated in the chronic phase; these data suggest that the Notch pathway is replaced over time by the ErbB signaling pathway, mRNA surveillance pathway, and Estrogen signaling pathway as chronic-phase symptoms appear. Molecular crosstalk between the Notch signaling pathway and the ErbB and Estrogen signaling pathways has also been acknowledged in breast cancer (33). This study is consistent with crosstalk between the emerging ErbB and Estrogen pathways and inhibition of the Notch signaling pathway in ITP. The mRNA surveillance pathway, a quality control mechanism that targets aberrant mRNAs for degradation, was enriched among the up-regulated genes (34). This pathway has not been reported for ITP, but the mechanism has been confirmed in autoimmune disease and in cellular defense against virus invasion. Mutations affecting the mRNA surveillance machinery cause chronic activation of defense programs, resulting in autoimmune phenotypes. Systemic lupus erythematosus (SLE), a human autoinflammatory and autoimmune disorder, is notably linked to deviations in this system (34). ITP manifests several symptoms that mimic diseases like SLE; therefore, one should be aware of this similarity, which several investigations have emphasized. Besides, pathway enrichment was also observed among the down-regulated genes in the chronic phase, which implies that the chronic phase of ITP may be due to perturbations in these pathways.
The network analysis also demonstrated that there are interactions among the DEGs.
Our network analysis revealed a set of candidate genes (three up-regulated and three down-regulated) for the investigation of biomarkers or molecular mechanisms of ITP, which were significantly correlated with chronic ITP: BUB3, GRK5, SF1, VIM, ARRB1, and RHOG.
Our network analysis also supports the involvement of the Notch signaling pathway in ITP. In this study, ARRB1 was considered a hub-bottleneck protein with a high degree and a high betweenness centrality value. This protein is strongly related to the Notch signaling pathway. Because of these features, it is an attractive candidate for drug targeting.
One of the essential genes that play an indispensable role in the maturation of hematopoietic precursors is Vimentin (VIM), which is a hub-bottleneck protein.
Alteration in the expression of VIM has been recognized in the maturation process of the megakaryocytic, granulomonocytic, erythroid, and lymphoid lineages (35). Up-regulation of VIM has also been shown in the formation of fully active macrophage-like cells and macrophage polykaryons (36). The Rho GTPase RhoG is one of the crucial genes in our analysis and has a central regulatory role in platelet production and megakaryocyte maturation (37).
One of the most important genes in this research was SF1. In addition to being a hub, SF1 was also found to be a TF when the TF expression data were integrated into Cytoscape. Kenichi Yoshida et al. reported that SF1 is mutated in hematologic malignancies, but the mutation frequency was too low to establish clinical associations with confidence (38).
The G-protein-coupled receptor kinase 5 (GRK5) is a critical member of the serine/threonine kinase family that phosphorylates G-protein-coupled receptors (GPCRs) and regulates the GPCR signaling pathway. GRK5 has a key role in several diseases; for example, it is a decisive pathogenic factor in early Alzheimer's disease, hepatic steatosis, metabolic disorders such as type II diabetes and obesity, the injured and failing heart, and cancer (39)(40)(41)(42)(43)(44). GRK5 also has multiple roles in signaling by Toll-like receptors (TLRs), a family of receptors involved in recognizing pathogen-associated molecular patterns (PAMPs) derived from microbes. Moreover, the importance of TLRs has been identified in several inflammatory diseases, including non-infectious diseases (45,46). In addition, detection of GRK5 expression provides a target for determining the effectiveness of drugs and patient prognosis in cancer (47).
BUB3 is one of the mitotic checkpoint proteins encoded by a group of evolutionarily conserved genes. It is believed that failure of the BUB gene family, a surveillance system that is a critical component of the regulatory process, causes genomic instability. This gene family encodes proteins that are part of a large multi-protein kinetochore complex (48,49). The importance of BUB3 has been found in colorectal cancer at a young age and in low-grade breast cancers (50,51).
The use of omics technology to identify disease mechanisms and discover biomarkers has received much attention in recent years. Microarray and proteomics approaches can help to resolve biological complexities by creating an extensive list of transcripts that are expressed simultaneously (52). As mentioned in the introduction, Zheng and his colleagues introduced six important markers for the diagnosis of ITP using proteomics technology in 2016 (11). However, these have not yet been used in the clinic. Our study, using microarray data analysis, introduces six new markers that can clarify the pathogenesis of ITP and that require further examination before clinical application.

Conclusion
The current study obtained DEGs through comprehensive bioinformatics analysis of high-throughput microarray data in order to find possible biomarkers. In summary, a total of 132 DEGs were screened. Six genes, BUB3, GRK5, SF1, VIM, ARRB1, and RHOG, have not previously been reported as signature genes in ITP; here we found that they might play critical roles in chronic ITP. This research contributes new insights into the molecular mechanisms of newly diagnosed and chronic ITP. Together, these six genes could be considered a panel of biomarkers to differentiate newly diagnosed from chronic ITP. Additional investigations are needed to focus on the clinical application of these genes.